Fast Pattern Matching in Strings

Authors

Donald E. Knuth

James H. Morris

Vaughan R. Pratt

Abstract

An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems. A theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language {αα^R}*, can be recognized in linear time. Other algorithms which run even faster on the average are also considered.

Text-editing programs are often required to search through a string of characters looking for instances of a given "pattern" string; we wish to find all positions, or perhaps only the leftmost position, in which the pattern occurs as a contiguous substring of the text. For example, catenary contains the pattern ten, but we do not regard canary as a substring. The obvious way to search for a matching pattern is to try searching at every starting position of the text, abandoning the search as soon as an incorrect character is found. But this approach can be very inefficient, for example when we are looking for an occurrence of aaaaaaab in aaaaaaaaaaaaaab. When the pattern is a^n b and the text is a^{2n} b, we will find ourselves making (n + 1)^2 comparisons of characters. Furthermore, the traditional approach involves "backing up" the input text as we go through it, and this can add annoying complications when we consider the buffering operations that are frequently involved.

In this paper we describe a pattern-matching algorithm which finds all occurrences of a pattern of length m within a text of length n in O(m + n) units of time, without "backing up" the input text. The algorithm needs only O(m) locations of internal memory if the text is read from an external file, and only O(log m) units of time elapse between consecutive single-character inputs. All of the constants of proportionality implied by these "O" formulas are independent of the alphabet size.
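The contrast drawn in the abstract can be illustrated with a short Python sketch. It is not the paper's own presentation: the function names (`naive_search`, `failure_table`, `kmp_search`) are hypothetical, the pattern is assumed to be non-empty, and the linear-time routine uses the common textbook "prefix table" formulation of the idea. Both searches also count character comparisons, so the quadratic behaviour of the obvious method on the a^n b example can be checked directly.

```python
def naive_search(pattern, text):
    """Try every starting position; return (match positions, comparisons made)."""
    matches, comparisons = [], 0
    for start in range(len(text) - len(pattern) + 1):
        for offset, ch in enumerate(pattern):
            comparisons += 1
            if text[start + offset] != ch:
                break  # abandon this starting position at the first mismatch
        else:
            matches.append(start)  # inner loop finished without a mismatch
    return matches, comparisons


def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(pattern, text):
    """Read the text once, left to right, never backing up in it.
    Return (match positions, comparisons made). Assumes a non-empty pattern."""
    if not pattern:
        raise ValueError("pattern must be non-empty (assumption of this sketch)")
    fail = failure_table(pattern)
    matches, comparisons, k = [], 0, 0  # k = number of pattern chars matched
    for i, ch in enumerate(text):
        while True:
            comparisons += 1
            if ch == pattern[k]:
                k += 1
                break
            if k == 0:
                break
            k = fail[k - 1]  # shift the pattern; the text index i stays put
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find later occurrences
    return matches, comparisons


if __name__ == "__main__":
    n = 7
    pattern, text = "a" * n + "b", "a" * (2 * n) + "b"
    print(naive_search(pattern, text))
    print(kmp_search(pattern, text))
```

For n = 7 the naive search makes (n + 1)^2 = 64 comparisons, while the table-driven search makes at most two comparisons per text character because the text index only ever moves forward.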


Similar resources

Flexible and Efficient Algorithms for Abelian Matching in Strings

The abelian pattern matching problem consists in finding all substrings of a text which are permutations of a given pattern. This problem finds application in many areas and can be solved in linear time by a naïve sliding window approach. In this short communication we present a new class of algorithms based on a new efficient fingerprint computation approach, called Heap-Counting, which turns ...


An Adaptive Hybrid Pattern-Matching Algorithm on Indeterminate Strings

We describe a hybrid pattern-matching algorithm that works on both regular and indeterminate strings. This algorithm is inspired by the recently proposed hybrid algorithm FJS [11] and its indeterminate successor [15]. However, as discussed in this paper, because of the special properties of indeterminate strings, it is not straightforward to directly migrate FJS to an indeterminate version. Our...


Fast pattern-matching on indeterminate strings

In a string x on an alphabet Σ, a position i is said to be indeterminate iff x[i] may be any one of a specified subset {λ1, λ2, . . . , λj} of Σ, 2 ≤ j ≤ |Σ|. A string x containing indeterminate positions is therefore also said to be indeterminate. Indeterminate strings can arise in DNA and amino acid sequences as well as in cryptological applications and the analysis of musical texts. In this ...


Abelian pattern matching in strings

Abelian pattern matching is a new class of pattern matching problems. In abelian patterns, the order of the characters in the substrings does not matter, e.g. the strings abbc and babc represent the same abelian pattern a+2b+c. Therefore, unlike classical pattern matching, we do not look for an exact (ordered) occurrence of a substring, rather the aim here is to find any permutation of a given ...


Filtration Algorithms for Approximate Order-Preserving Matching

The exact order-preserving matching problem is to find all the substrings of a text T which have the same length and relative order as a pattern P. Like string matching, order-preserving matching can be generalized by allowing the match to be approximate. In approximate order-preserving matching two strings match if they have the same relative order after removing up to k elements in the same p...


A Highly Parallel Finite State Automaton Processor for Biological Pattern Matching

Finite State Automata are useful for string searching problems mostly because they are fast. For very large problems, a software implementation will not be fast enough. I describe here a parallel implementation of a hardware Deterministic Finite State Automaton processor. It can rapidly search a large database for approximately matching strings, as a filter for more detailed processing later. As ...



Journal title:

SIAM J. Comput.

Volume 6, Issue

Pages -

Publication date: 1977
